AMENDMENTS TO THE SPECIFICATION:

Replace the Abstract with the following: 

A computer includes a processor; a memory system; and a co-processing unit with an 
associated plurality of data registers for data exchange. The computer is controlled to 
implement a method of increasing efficiency in executing a matrix operation that uses matrix 
data in standard format, the standard format being one of column major format and row major 
format. The matrix operation is executed in the co-processing unit. For matrix data stored in 
the standard format in the memory system, wherein the matrix data is data of any of a 
complete matrix, a complete submatrix, or a part of a matrix or submatrix, the processor 
separates the matrix data into blocks of data, each block having a size p-by-q. 
The processor rearranges the blocks to be contiguous data and places the blocks in the 
memory system of the computer for retrieval in a repetitive manner for executing the matrix 
operation. The data within the blocks retain the original matrix data content but the blocks 
are moved to be in an ordering different from the original ordering of the blocks within the 
matrix, such that the matrix data is represented in a format that permits the matrix data to be 
moved from the memory system into a position in the plurality of data registers for 
performing the matrix operation more quickly than if the matrix data had been moved as 
stored in the standard format. 
Replace the text beginning at line 18 of page 4 through line 11 of page 6 with the 
following four paragraphs: 

In a first exemplary aspect of the present invention, described herein is a computer 
including a processor; a memory system; and a co-processing unit with an associated 
plurality of data registers for data exchange. The computer is controlled to implement a 
method of increasing efficiency in executing a matrix operation that uses matrix data in a 
standard format, the standard format being either column major format or row major format, 
and the matrix operation is executed in the co-processing unit. The method includes, for 
matrix data stored in the standard format in the memory system, wherein the matrix data 
comprises data of any of a complete matrix, a complete submatrix, or a part of a matrix or 
submatrix, using the processor to separate the matrix data into blocks of data, each block 
having a size p-by-q, and rearranging, by the processor and placing in the memory system of 
the computer, for retrieval in a repetitive manner for executing the matrix operation, the 
blocks of data to be contiguous data. The data within the blocks retain an original matrix data 
content but the blocks are moved to be in an ordering different from an original ordering of 
the blocks within the matrix, such that the matrix data is represented in a format that permits 
the matrix data to be moved from the memory system into a position in the plurality of data 
registers for performing the matrix operation more quickly than if the matrix data had been 
moved as stored in the standard format. 

In a second exemplary aspect of the present invention, also described herein is a 
computer including a processor, a storage, and a co-processing unit, the computer configured 
to implement a method of increasing efficiency in executing a matrix operation that uses 
matrix data in a standard format, the standard format being either column major format or 
row major format. The method includes converting, by the processor, at least a part of the 
matrix data into a new or optimal matrix format being contiguous data that no longer 
represents the matrix data in the standard format, the optimal matrix format comprising a 
representation of a subset of the matrix data that is predetermined to permit a loading of the 
matrix data from the storage into the co-processing unit optimally to perform the matrix 
operation in a minimal time in the processing unit. The optimal matrix format comprises a 
re-arrangement of blocks of the matrix data wherein data within each block retains its original 
values. A selected block of matrix data is then repetitively loaded in the optimal matrix 
format into the co-processing unit for correctly executing the matrix operation. 

In a third exemplary aspect of the present invention, also described herein is a 
computer including a processor, a storage, and a co-processing unit with an associated plurality of 
data registers for data exchange, the computer having at least one of a machine architecture 
and an instruction set having one or more features that are less than optimal for executing a 
matrix operation, thereby causing a disadvantage in processing data for the matrix operation. 
The computer is configured to implement a method of overcoming the disadvantage by 
software instructions, the method including rearranging, by the processor, at least a part of 
matrix data to be used in the matrix operation into a plurality of blocks, each block having 
size p-by-q, such that the matrix data is no longer stored in a standard matrix format being 
either row major format or column major format. The rearranged matrix data in blocks is 
stored in the storage as contiguous blocks of contiguous data in a new format such that an 
original content of data within the blocks is retained but an ordering of the blocks is changed. 
The new format of matrix data is predetermined to allow the matrix data to be placed from 
the storage into the co-processing unit for processing the matrix data in the matrix operation 
such that the disadvantage on the computer is overcome and the matrix processing will be 
correctly executed. The matrix data is repetitively loaded in the new format from the storage 
into at least a subset of the data registers of the co-processing unit in a new or optimal format 
that allows a minimal possible time to get data into the processing unit to utilize the matrix 
data in the matrix operation. 

In a fourth exemplary aspect of the present invention, also described herein is a 
computer including a processor, a storage, and a co-processing unit with an associated 
plurality of data registers for data exchange. The computer is configured to implement a 
method of overcoming a hardware disadvantage on the computer relative to a specific 
processing on a specific computer architecture/set of instructions using the co-processing 
unit, the hardware disadvantage reducing an efficiency of the specific processing. The 
method includes using first software instructions to preliminarily process input data by the 
processor in a manner to generate a first error relative to the specific processing, the first 
error being a conversion of the input data into a predetermined new format of input data. 
Second software instructions are used to subsequently process the input data in the new 
format in a manner to generate a correcting error relative to the specific processing, said 
correcting error including loading the input data into the plurality of data registers in a new 
word order of the input data. The first software instructions, in combination with the second 
software instructions, overcome the disadvantage and compute a correct result. The 
specific processing involves a matrix operation, and the disadvantage involves a loading of matrix 
data from the storage into the co-processing unit that causes a non-optimal processing of the 
matrix data in the matrix operation. The first error includes storing the matrix data in the 
storage in a format that converts the matrix data from standard column major or row major 
format into a new format predetermined to overcome the disadvantage when the data is 
subjected to the correcting error, such that an original content of data within the blocks is 
retained but an ordering of the blocks is changed. The correcting error includes loading the 
data in the new format from the storage into the plurality of data registers using a loading 
format that involves a non-standard word order of the matrix data, permitting the loading to 
be done optimally and the matrix processing to be done correctly. 
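
By way of a non-limiting illustration only, the rearrangement of standard-format matrix data into contiguous p-by-q blocks described in the foregoing aspects may be sketched as follows. The function names pack_blocks and get_block, the block-column traversal order, the column-major ordering retained inside each block, and the requirement that p and q evenly divide the matrix dimensions are all assumptions made for this sketch; the specification does not prescribe them, and the register-level loading step is not modeled.

```python
def pack_blocks(a, m, n, p, q):
    """Repack an m-by-n column-major matrix into contiguous p-by-q blocks.

    `a` is a flat list of length m*n in column-major order, so element
    (i, j) is stored at a[i + j*m].  Assumption for this sketch: p divides
    m and q divides n.  Blocks are emitted one block column at a time, and
    the elements inside each block stay in column-major order, so every
    value is preserved and only the placement of the blocks changes.
    """
    if len(a) != m * n:
        raise ValueError("a must hold exactly m*n elements")
    if m % p != 0 or n % q != 0:
        raise ValueError("p must divide m and q must divide n")

    packed = []
    for bj in range(0, n, q):          # first column of the current block
        for bi in range(0, m, p):      # first row of the current block
            for j in range(bj, bj + q):
                for i in range(bi, bi + p):
                    packed.append(a[i + j * m])
    return packed


def get_block(packed, k, p, q):
    """Return the k-th block as one contiguous slice of p*q elements."""
    size = p * q
    return packed[k * size:(k + 1) * size]
```

For example, with a 4-by-4 matrix whose column-major storage is the integers 0 through 15 and with p = q = 2, pack_blocks returns [0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15]. Each run of four consecutive elements is one block, which is why a selected block can be fetched repetitively as a single contiguous slice rather than gathered from strided locations.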